I. Energy efficiency in mobile communication networks: A driving source for innovation

Energy efficiency particularly matters in future mobile communications networks. A key driving factor is the
growing energy cost of network operation, which can make up as much as 50% of the total operational cost
nowadays [1]. In the context of green information and communication technology (ICT) this has led to many
global initiatives such as the Green Touch consortium.




A major source for reducing energy costs is to increase the efficiency of the high power amplifier (HPA) in
the radio frequency (RF) front end of the base stations [1]. However, the efficiency of the HPA is directly related
to the peak-to-average power ratio (PAPR) of the input signal. The problem becomes especially serious
in orthogonal frequency-division multiplexing (OFDM) multicarrier transmission, which is applied in many
important wireless standards such as the 3GPP Long Term Evolution Advanced (LTE-A). In the remainder of
this article we refer to it simply as the PAPR problem. The PAPR problem still prevents OFDM from being
adopted in the uplink of mobile communication standards and, besides power efficiency, it can also
place severe constraints on output power and therefore coverage in the downlink.

In the past, there have been many efforts to deal with the PAPR problem, resulting in numerous papers and
several overview articles, e.g., [2], [3], [4]. However, with the advent of novel systems, new challenges
emerge which have rarely been addressed so far: 1.) the envisioned boost in network energy efficiency (e.g.
at least by a factor of 1000 in the Green Touch consortium) will tighten the requirements on component
level so that the efficiency gap with respect to single-carrier transmission must considerably diminish; 2.)
multiple-input/multiple-output (MIMO) systems multiply the problem due to the simultaneous control of parallel
transmit signals, particularly when considering a huge number of transmit antennas; 3.) multiuser (MU) (and
multipoint) systems put additional side constraints on the parallel transmit signals which are difficult to
implement on top of conventional approaches. Furthermore, many of the existing methods are either not
compatible with relevant standards or their prospective performance capabilities are not satisfactory. As yet,
it is quite safe to say that no standard solution is available.

In this article, we will argue that, in the light of these challenges, the PAPR metric itself has to be carefully
reviewed within a much broader scope, overturning some of the common understanding and results. New
metrics become more and more important since they enable the system designer to precisely adjust the
algorithms to meet some given performance indicator. It is expected that such a design approach will no longer
be treated as an isolated problem on the physical layer but will affect the design parameters on higher layers as
well (e.g. resource allocation). For example, it has been discussed in [1] that from an ICT perspective the system
throughput should be related to input power rather than output power. In order to capture this paradigm on
the level of HPA power efficiency, different metrics are currently used such as total degradation, average distortion
power and others. However, with respect to algorithm design all these metrics are solely reflected by the
standard PAPR figure of merit. This argument can also be extended to other situations: it has been recently
shown in [5] that, if the only concern is average distortion power (instead of peak power), then a much less
conservative design is possible compared to conventional design rules in OFDM transmission. Remarkably,
such performance limits can be efficiently achieved using derandomization algorithms, which therefore establish
a new powerful tool within the context of the PAPR problem. It is a major aim of this article to review
and collect exactly those elements in the current literature which we believe represent the core of a more
general theory.

Besides this point of view, it is interesting to apply new signal processing and mathematical concepts to
OFDM. Compressed sensing [6], [7] is a new framework capturing sparsity in signals beyond Shannon's
sampling theorem and has attracted a lot of attention in recent years. It is based on the observation that a
small number of linear projections (measurements) of a sparse signal contain enough information for its
recovery. Compressed sensing can be applied to the PAPR problem because sparsity frequently appears in
the (clipped) OFDM signals. There are currently many research efforts in this direction but some challenges
still remain, such as the degraded recovery performance in noisy environments. The adoption of related
mathematical concepts such as Banach space geometry [8], [9] complements this discussion. It is outlined
that these theoretically deeply rooted concepts can help to understand some of the fundamental limits as well
as to develop new algorithmic solutions for the PAPR problem.

2 See, e.g., the webpage: http://www.greentouch.org

In summary, there is a clear need for a fresh look at the PAPR problem under the general umbrella
of the metrics theme discussed before, which will open up new research strands not yet explored. In this
article we address and discuss some of the fundamentals, challenges, latest trends, and potential
solutions which originate from this perspective and which, we believe, are important to come to an innovative
breakthrough for this long-lasting problem.

A. Outline and some notations 

The outline of the article is as follows: first we motivate and discuss alternative metrics and the corresponding
methodology for the PAPR problem and present several examples. Then, we propose an appropriate
theoretical framework and unified algorithm design principles for these new paradigms by introducing the
derandomization principle. In this context, we outline specific challenges imposed by MIMO and MU MIMO
systems. Next, we discuss capacity issues which establish fundamental limits. Finally, we discuss some of
the future directions which the authors believe are the foundations or at least components of emerging solutions.

We recall the following standard notation: the frequency-domain OFDM symbol for each antenna ($N_t$ in
total) consists of $N$ subcarriers. The multiplexed transmit symbols $C_{m,n}$ (carrying information and/or control
data) are drawn from some common QAM/PSK signal constellation and collected in the space-frequency
codeword $C := [C_1, \ldots, C_{N_t}]$ where $C_m := [C_{m,1}, \ldots, C_{m,N}]^T$ is the transmit sequence of antenna $m$. In case
of a single antenna we write $C_n := C_{1,n}$ and, correspondingly, $C := C_1 = [C_1, \ldots, C_N]^T$. Given the IDFT
matrix $F := [e^{j 2 \pi k n / (I N)}]_{0 \le k < I N,\, 0 \le n < N}$, the $I$-times oversampled discrete-time OFDM transmit symbols in
the equivalent complex baseband at antenna $m$ are given by $s_m = F C_m$. The average power of this signal
might be normalized to one. We define the PAPR of the transmit signal at antenna $m$ as

$$\mathrm{PAPR}(s_m) := \|s_m\|_\infty^2. \qquad (1)$$

Comment on oversampling: Please note that the PAPR of the continuous-time passband signal differs roughly
by 3 dB. Clearly, there is also still some overshooting between the samples, but with sufficiently high
oversampling the effect is negligible. The trade-off between overshooting and oversampling is one of the few
subproblems in OFDM transmission that is well understood. The best known results, which hold even in the
strictly band-limited case, are given in [10] where overshooting is proved to be below $1/\cos(\pi/(2I))$.
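The definitions above can be illustrated numerically. The following is a minimal sketch, not taken from the article: the function names `ofdm_samples`, `papr` and `overshoot_bound` are hypothetical, the IDFT is evaluated by a direct sum rather than an FFT, the peak power is divided by the measured average power instead of assuming unit normalization, and the overshoot factor is assumed to have the form $1/\cos(\pi/(2I))$.

```python
import cmath
import math


def ofdm_samples(C, I=4):
    """Return the I-times oversampled samples s = F C via a direct IDFT sum."""
    N = len(C)
    M = I * N  # number of time-domain samples
    return [sum(C[n] * cmath.exp(2j * math.pi * k * n / M) for n in range(N))
            for k in range(M)]


def papr(s):
    """Peak power divided by average power of the sample vector s."""
    powers = [abs(v) ** 2 for v in s]
    return max(powers) / (sum(powers) / len(powers))


def overshoot_bound(I):
    """Assumed form 1/cos(pi/(2I)) of the overshoot factor between samples."""
    return 1.0 / math.cos(math.pi / (2 * I))


if __name__ == "__main__":
    N = 16
    in_phase = [1.0] * N  # all subcarriers add up constructively at k = 0
    print("PAPR of the in-phase frame:", papr(ofdm_samples(in_phase)))
    print("overshoot factor for I = 4:", overshoot_bound(4))
```

For the in-phase frame the ratio equals the number of subcarriers, which is the worst case over all frames with unit-modulus symbols.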

II. The design challenge 

In OFDM transmission many subcarriers add up (constructively or destructively) at a time, which causes
large fluctuations of the signal envelope; a transmission which is free from any distortion requires linear
operation of the HPA over a range of N times the average power. As practical numbers of subcarriers are large, this
high dynamic range forces HPA operation well below saturation so that most of the supply power is wasted,
with deleterious effect on either battery lifetime in mobile applications (uplink) or energy cost of network
operation (downlink). In practice, these values are not tolerable and from a technology viewpoint it is also
challenging to provide a large linear range. Hence, the HPA output signal is inevitably cut off at some point
relative to the average power (clipping level), leading to in-band distortion in the form of intermodulation
terms and spectral regrowth into adjacent channels. The effect is illustrated in Fig. 1 where the distorted
OFDM signal and the corresponding impact on the signal points are depicted.

The PAPR problem brings up several challenges for the system designer: one challenge is to adjust HPA
design parameters (HPA backoff, digital predistortion) in some specific way so that power efficiency is traded
against nonlinear distortion, which affects the data transmission on a global scale. How to capture this trade-off by
a suitable metric on the level of the HPA is far from clear yet. Special HPA architectures at component level
such as Doherty [11] and others can help to improve on this trade-off. We also mention that other design
constraints such as costs might prevent specific architectures [12].

A second challenge is to process the baseband signal by peak power reduction algorithms in such a way
that the key figures of merit in the aforementioned trade-off are improved. This alternating procedure makes
apparent that the PAPR problem involves joint optimization of HPA, predistortion and signal processing unit.
This interplay has only been marginally addressed so far, let alone in the context of multiuser systems equipped
with multiple antennas such as LTE-A.

In the following we discuss some potential metrics that can be used in the optimization. 

III. The right paradigm? Alternative metrics for PAPR 

Classically, in OFDM transmission the PAPR of the transmit signal is analyzed and minimized by applying
transmitter-side algorithms. Meanwhile it has been recognized that it may be reasonable to study other
parameters as well. Especially when aiming at minimizing the energy consumption of the transmitter including
the analog front end, or when operating low-cost, low-precision power amplifiers (sometimes referred to as
"dirty RF" [12]), potentially other signal properties need to be controlled.

Let us present an illustrative example first. Suppose we are interested in the clipped energy instead of
the PAPR (we give some justification for this in terms of capacity below). Naturally, since the total energy
is approximately one, the clipped energy is finite as well, but as N increases the clipping level required
for asymptotically zero clipped energy might be of interest for design purposes. Clearly, no clipping at all
is trivially sufficient but, surprisingly, it is actually not necessary: it is proved in [5] that the clipping level can
be adjusted along the log log (N) law so that it is practically almost constant. This stands in clear contrast
to the log (N) PAPR scaling discussed in Sec. IV.



Subsequently, some alternative metrics replacing the PAPR value in specific situations are briefly summarized.

• Of course, the PAPR still has its justification. As it relates peak and mean power, the PAPR is the
adequate metric for quantifying the required input power backoff of the power amplifier. When using
higher-order modulation per carrier the energy per OFDM frame is no longer constant and hence the average
power fluctuates. In this case, the peak power is a suitable metric. Both metrics are well-suited measures
if purely limiting effects (modeled, e.g., as a soft limiter) are to be characterized. Since the PAPR is random,
we are also interested in the complementary cumulative distribution function (CCDF) $F_c(x)$ and other
characteristic figures such as the mean $E\{\cdot\}$ etc.

• Besides looking at the transmit signal itself, in many situations the impact of the nonlinear power
amplifier on system performance is of interest. One possible approach is to quantify the nonlinear
distortions caused by a particular power amplifier model. The signal-to-distortion ratio (SDR) [15],
[16] or error vector magnitude (EVM) [13], [14] capture the in-band distortion of the OFDM signal
and are immediately related to the error rate of uncoded transmission. Both SDR and EVM and their
interdependence are illustrated in Fig. 1.

• Within 3GPP, a power amplifier model which causes nonlinear distortions according to the third power
of the RF transmit signal has become popular (so-called cubic polynomial model) [17]. In this case,
the cubic metric (CM) measuring the mean of time domain sample energy to the third power is well
adapted to the specific scenario [18], [19]. However, any additional clipping is not included here.

• The CM is a special case of the amplifier-oriented metric (AOM) defined in [20], [19]. AOM measures
the mean squared absolute difference of desired and distorted HPA output. Here, the HPA output is
calculated based upon some model such as the mentioned cubic polynomial model (or the well-known
Rapp model etc.).
Fig. 1. Illustration of distorted OFDM signal (in time domain, clipping level = 3.5 dB) and corresponding impact on the subcarrier
signalling points (in frequency domain): the distortion signal is typically a sequence of clips with clip durations $t_1, t_2, \ldots$ The SDR
metric then relates the mean useful signal energy to the mean distortion energy while EVM collects the mean of the sum of squared errors
$d_1, d_2, \ldots$ in the data sequence (due to the very same time domain distortion). In case of Nyquist sampling both are actually equal
(subject to a scaling factor) [13], [14].



• A much more severe problem in communication systems is that nonlinear devices cause a spectral widening
and hence generate out-of-band radiation. In order not to violate spectral masks imposed by the regulatory
body, a metric quantifying the out-of-band power or the shoulder attenuation is desirable [20]. In this
field, significant work has to be done.

• When applying strong channel coding schemes, which nowadays is the state of the art in OFDM
transmission, SDR or EVM are no longer suitable performance characteristics. Instead, the end-to-end
capacity of the entire OFDM scheme including the nonlinear devices matters. Unfortunately, neither
the capacity of the continuous-time peak-power limited additive white Gaussian noise (AWGN) channel
itself, nor the capacity of OFDM over such channels is known. Moreover, when using (as often done) a
statistical model for the behavior of the nonlinear device, only lower bounds on the capacity are obtained
as the statistical dependencies within one OFDM frame are not exploited.

However, recent work [21], [22] indicates that clipped OFDM performs (almost) the same as unclipped
OFDM with signal-to-noise ratio reduced according to the clipping power loss. Hence, the main source
of the loss is not the introduced distortions or errors but simply the reduced output power. This, in
turn, leads to the conclusion that a suitable metric for capacity maximization is simply the average power
of the power amplifier output signal. In this case, unfortunately, the generation of out-of-band radiation
is not penalized.

• The symbol error rate (SER) is a related measure which has been directly applied to peak power control
algorithms in [23].

• In future applications, more than a single signal parameter will have to be controlled. E.g., the capacity
should be maximized but at the same time the out-of-band power should be minimized. Consequently,
suitable combinations of metrics capturing the desired trade-off are required.
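Several of the listed metrics can be evaluated with a few lines of code for a soft-limiter model. The sketch below rests on assumptions that are not part of the article: the function names are hypothetical, the HPA is an ideal envelope clipper that preserves the phase, the clipped energy is taken as the mean squared difference between input and clipped output, and the SDR is the plain ratio of mean input power to that distortion power (no Bussgang-type decomposition into useful and uncorrelated parts is attempted).

```python
import math


def soft_limiter(s, clip_db):
    """Clip the envelope at clip_db dB above the average power; keep the phase."""
    avg = sum(abs(v) ** 2 for v in s) / len(s)
    a_max = math.sqrt(avg * 10 ** (clip_db / 10.0))  # clipping amplitude
    return [v if abs(v) <= a_max else a_max * v / abs(v) for v in s]


def metrics(s, clip_db):
    """Return clipped energy, SDR in dB and average output power per sample."""
    out = soft_limiter(s, clip_db)
    n = len(s)
    p_in = sum(abs(v) ** 2 for v in s) / n
    p_out = sum(abs(v) ** 2 for v in out) / n
    p_dist = sum(abs(a - b) ** 2 for a, b in zip(s, out)) / n
    sdr_db = float("inf") if p_dist == 0 else 10 * math.log10(p_in / p_dist)
    return {"clipped_energy": p_dist, "sdr_db": sdr_db, "output_power": p_out}


if __name__ == "__main__":
    samples = [1.0, 2.0, 0.5j, -3.0]  # toy samples, not an actual OFDM frame
    for level in (0.0, 3.0, 6.0):
        print(level, metrics(samples, level))
```

Raising the clipping level lowers the clipped energy and raises both SDR and average output power, which is the trade-off the bullets above describe from different angles.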

Comment on Gaussian approximation: Notably, many metrics have been analyzed in the past with the
help of the Gaussian approximation. This, however, is not a feasible path in all cases. It is true that as N
gets large the finite-dimensional distributions converge and, since the signals are band-limited, so does the process
itself. However, we mention that this is a local property, valid only for any finite interval. For example, in Fig.
2 the empirical CCDF of the clip duration (see the illustration in Fig. 1) is shown for OFDM signals and
compared to a widely unknown result for envelopes of Gaussian processes (and their Hilbert transforms) for
large clipping levels [24]. It is seen that simulation and analysis match very well [23]. On the other hand,
metrics such as EVM [14] and SER [25] have been shown not to match well.

Fig. 2. Palm distribution of clip duration of an OFDM signal with N = 2048 and different clipping levels in [dB]. Note that the
theory (dotted) is valid only for large clipping levels but fits relatively well also for low clipping levels.



IV. Approaching the log (N) barrier: Derandomization 

In this section we discuss several fundamental principles for the peak power control problem. We believe 
that all of them actually connect to a broader theory general enough to capture alternative metrics as well 
and will open the door for new, provably more efficient algorithms. 

A. The LDP 

When analyzing the PAPR of multicarrier signals one faces a fundamental barrier which seems quite
challenging to overcome: the log (N) barrier (recall: N is the number of subcarriers). In fact, it is an exercise in large
deviations to show that multicarrier signals with statistically independent subcarriers have a PAPR of log (N)
in a probabilistic sense [26], [27], [28]. This means that with very high probability the PAPR lies in an open,
arbitrarily small interval containing log (N): this is what we call the large deviation principle or in short
LDP.

Implicitly, the LDP affects the performance of many peak power control schemes. The LDP has long been known
in the context of random polynomials, but in the OFDM context the most general form is due
to [29] where it is shown that, as long as N is large enough and the subcarriers are independent, the
following inequality is true:

$$\Pr\left( \left| \sqrt{\mathrm{PAPR}} - \sqrt{\log(N)} \right| > \gamma \frac{\log[\log(N)]}{\sqrt{\log(N)}} \right) \le \frac{1}{[\log(N)]^{2\gamma - \frac{1}{2}}} \qquad (2)$$

Here, $\gamma > \frac{1}{4}$ is some design parameter which trades off probability decay against deviation from log (N). While
the analysis is tricky when it comes to showing that the PAPR is not below log (N) with high probability, it is a
surprisingly easy task to show the converse: standard inequalities (such as Chernoff bounds) or any other
Markov-style bound do the job. Some can be exploited for algorithm design.

3 To be specific, Ref. [24] derives the so-called Palm distribution of the clip duration which describes the statistical average after an
upcrossing of the Gaussian process.

Fig. 3. Illustration of the virtue of the LDP: [from right to left] CCDF of PAPR of a.) uncoded data b.) SLM and c.) derandomization.

The inequality states that the PAPR concentrates more and more around the value log (N), which therefore
establishes an important theoretical scaling law. The proof is technical but the result might be surprising since
1) the factor before the logarithmic term is exactly unity and 2) the scaling law differs from the well-known
law of the iterated logarithm which would suggest only doubly logarithmic scaling.

The LDP contains some valuable illustrative aspects which we are going to reveal now. The LDP in eqn.
(2) is somewhat inaccessible and shall be rewritten in the more convenient form:

$$\log(F_c(x)) = [\log(N) + O(\log[\log(N)]) - x]^- \qquad (3)$$

where we used the order notation $O(\cdot)$ and the definition $[x]^- := \min(0, x)$. Disregarding the order term
$O(\log[\log(N)])$ we have the interpretation that the probability decreases linearly on a logarithmic scale
from some cut-off point log (N) on, which is illustrated in Fig. 5. The proximity to filter design terminology
is intended and it obviously makes sense to speak of a pass band and a stop band in the figure. Comparing
this to the standard analysis where statistical independence and Nyquist sampling are assumed gives

$$\log(F_c(x)) = [\log(N) - x]^- \qquad (4)$$

where the order term is missing. Hence, we conclude that a careful non-Gaussian analysis for continuous-time
OFDM signals entails an error of at most $O(\log[\log(N)])$.
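The piecewise-linear form of eqn. (4) can be compared numerically with the textbook CCDF for independent samples. The sketch below assumes, beyond what the article states, that under the standard analysis the PAPR has the CCDF $F_c(x) = 1 - (1 - e^{-x})^N$ (N independent, exponentially distributed sample powers of unit mean); the function names are hypothetical.

```python
import math


def log_ccdf_nyquist(x, N):
    """log F_c(x) for F_c(x) = 1 - (1 - exp(-x))**N (assumed model), x > 0."""
    # log1p and expm1 keep precision when exp(-x) is very small.
    return math.log(-math.expm1(N * math.log1p(-math.exp(-x))))


def log_ccdf_ldp(x, N):
    """Right-hand side of eqn. (4): [log(N) - x]^- with [u]^- = min(0, u)."""
    return min(0.0, math.log(N) - x)


if __name__ == "__main__":
    N = 1024
    cutoff = math.log(N)
    for offset in (-3.0, -1.0, 0.0, 1.0, 3.0, 6.0):
        x = cutoff + offset
        print(offset, log_ccdf_nyquist(x, N), log_ccdf_ldp(x, N))
```

Below the cut-off point both expressions are essentially zero (pass band), and above it both decay with slope close to minus one (stop band); the largest gap occurs in the transition region around log (N).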

The LDP is very useful for assessing the performance of peak-power control schemes. Before we show
this we might ask why this concentration happens. Let $C_1, C_2, \ldots, C_N$ be a (data) sequence of independent
random variables; when estimating the PAPR without a priori information the expectation is the best possible
choice. Using successive knowledge of already fixed data we have the following estimates:

$$y_0 = E\{\mathrm{PAPR}(s)\} \qquad (5)$$
$$y_1 = E\{\mathrm{PAPR}(s) \mid C_1\} \qquad (6)$$
$$y_2 = E\{\mathrm{PAPR}(s) \mid C_1, C_2\} \qquad (7)$$
$$\vdots$$
$$y_N = E\{\mathrm{PAPR}(s) \mid C_1, C_2, \ldots, C_N\} \qquad (8)$$

It can be shown that this process establishes a martingale with bounded increments $|y_i - y_{i-1}|$, from which
measure concentration of the PAPR around its average follows (see [30]) via the Azuma-Hoeffding inequality
or McDiarmid's inequality. Another approach used in [30] for proving measure concentration of the PAPR
around its median is based on the convex-hull distance inequality of Talagrand. The tails of the concentration
inequalities are then even exponential. Let us now apply the LDP within the context of peak power control.

Fig. 4. A general model for peak power control: each cell contains the same information; sets with large PAPR (dotted areas) are
mapped to sets where PAPR is below some threshold (grey-shaded areas).
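For orientation, the Azuma-Hoeffding inequality invoked in the martingale argument can be stated in its generic form. The increment bounds $c_i$ are placeholders introduced here for illustration; the actual values for the PAPR martingale are part of the analysis in the cited reference and are not reproduced.

```latex
% Azuma-Hoeffding inequality for a martingale y_0, y_1, ..., y_N
% with bounded increments |y_i - y_{i-1}| <= c_i (c_i: generic placeholders)
\Pr\left( \left| y_N - y_0 \right| \ge t \right)
  \le 2 \exp\left( - \frac{t^2}{2 \sum_{i=1}^{N} c_i^2} \right),
  \qquad t > 0 .
```

Since the last estimate equals the PAPR itself and the first equals its expectation, the bound controls the deviation of the PAPR from its mean with a tail that decays exponentially in $t^2$.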

B. Multiple signal representation and partitioning 

The basic principle for most of the peak power control schemes is multiple signal representation, which
is rooted in the classical methods selected mapping (SLM) [31] and partial transmit sequences (PTS) [32], [33].
The idea is simple: instead of transmitting the original OFDM data frame, multiple redundant candidates are
generated and the "best" candidate is singled out for transmission. By using suitable transforms or mappings
the main goal is to achieve statistical independence between the candidates' metrics. Clearly, instead of
PAPR, also alternative metrics can be used in the selection process [20], [19]. SLM and PTS are similar; the
difference is that in PTS the mappings are applied to subsets of the data frame.

For SLM many transforms have been proposed: in the original approach the data frame is element-wise
multiplied by random phases; other popular approaches include binary random scrambling and permutation
of the data (see the references in [19], where the transmission of side information is also discussed). Similarly
for PTS, random phases have been used as well. While typically a full search is carried out, efficient algorithms to find the
phases have been proposed. An exhaustive list can be found in 0. 

SLM can be analyzed within the context of the LDP. The transforms define U alternatives, each assumed to have
independent PAPR. Clearly this independence assumption is crucial: it might be argued that it has to hold
for the PAPR only, but the model clearly fails when the number of alternatives is large. By exploiting the
LDP we then simply have

$$\log(F_c(x)) = U [\log(N) - x]^- \qquad (9)$$

so that the decay is U times faster, as depicted in Fig. 3. A similar analysis can be carried out when the
selection is directed or carried out over extended time [19]. Note that in principle PTS can be analyzed as well;
however, since the transform acts on subsets of the data, the independence assumption is far more critical. Another
main problem so far is that side information is treated separately and not within the same communication
model.
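The selection step of SLM and the U-fold faster decay of eqn. (9) can be sketched as follows. This is an illustrative toy implementation under stated assumptions, not the scheme of any cited reference: the names `papr_of_frame`, `slm` and `slm_log_ccdf` are hypothetical, the candidates are generated by element-wise multiplication with uniformly random phases, the unmodified frame is kept as candidate 0, and side information is ignored entirely.

```python
import cmath
import math
import random


def papr_of_frame(C, I=4):
    """PAPR of the I-times oversampled time-domain signal of the frame C."""
    N = len(C)
    M = I * N
    powers = [abs(sum(C[n] * cmath.exp(2j * math.pi * k * n / M)
                      for n in range(N))) ** 2 for k in range(M)]
    return max(powers) / (sum(powers) / M)


def slm(C, U, rng, I=4):
    """Full search over U candidates; returns (best PAPR, candidate index)."""
    best = (papr_of_frame(C, I), 0)  # candidate 0: the original frame
    for u in range(1, U):
        phases = [cmath.exp(2j * math.pi * rng.random()) for _ in C]
        value = papr_of_frame([c * p for c, p in zip(C, phases)], I)
        if value < best[0]:
            best = (value, u)
    return best


def slm_log_ccdf(x, N, U):
    """Eqn. (9): U * [log(N) - x]^-, valid under the independence assumption."""
    return U * min(0.0, math.log(N) - x)


if __name__ == "__main__":
    rng = random.Random(0)
    frame = [cmath.exp(1j * math.pi / 2 * rng.randrange(4)) for _ in range(32)]
    print("original PAPR:", papr_of_frame(frame))
    print("SLM with U = 8:", slm(frame, 8, rng))
```

Because the original frame is always among the candidates, the selected PAPR can never exceed the original one; how much it improves depends on how close the candidates come to statistical independence.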

A better model is complete partitioning of the set of transmit sequences. The idea is illustrated in Fig.
4. Suppose that the transmit sequences belong to some set which is partitioned into many cells, all of them
containing the same information. Note that if the actual cell selection is required at the receiver for decoding,
side information is generated. In our general model this side information belongs to the transmit sequence
itself and must be specially protected. This can be done via an embedded code which is decoded before or
after the actual information decoding procedure 0. Let us mark the subsets where PAPR is below some 



threshold: the reasoning is that by the mapping of codewords from one cell into another, sets with larger 
PAPR should be mapped to a marked subset by at least one mapping which will ensure peak power below 
the threshold. Obviously, the definition of such a mapping will determine the performance of the scheme. 

One of the simplest examples is when the data is over some constellation and side information is encoded
into a sequence of BPSK symbols: each sequence defines a specific BPSK vector determining the sign vector.
Both the modified information and the side information sequence define the transmitted codeword. This method is
called sequence balancing [34]. It is characteristic for this method that correlation is inserted in the stream
by using suitable binary codes. We will call this the binary correlation model. Notably, if the side
information is purely redundant the method reduces to tone reservation [4]. Moreover, if the selection defines
phase relations between partial sequences then it is a version of PTS [32]. A related approach is also Trellis
shaping E3, El. 

Sequence balancing using binary codes can already achieve a sufficient fraction of the theoretically possible
performance gain (and is easily generalized): the main required property of the set of binary vectors
is their ability to realize as many sign changes as possible over any subvector, which is called the strength of the
code [34]. Many binary codes have this property and are thus suited for this procedure. The strength is related
to the dual distance. It can be shown that if the strength grows as log (N) then the PAPR is below log (N) for
large N. Unfortunately, similar to SLM and PTS, the number of candidates grows as well.

There are other methods which use partitioning as well, such as tone injection [4] where the constellation is
artificially extended or translates of codes [36]. Schemes such as active constellation extension [37] introduce
redundancy as well but can be continuously formulated so that other methods such as convex optimization
can be applied.

All discussed approaches assume to run a full cell selection search which is too complex in many situations. 
A better approach is discussed next. 

C. Derandomization of choices 

The LDP provides a method to circumvent full search by assuming a suitable underlying probability
model for the cell selection. By derandomizing the cell selection one can easily devise suitable algorithms
guaranteeing a PAPR reduction very close to the log (N) barrier [38], [29], [39]. The basic algorithm goes
back to Spencer [40] who called it the probabilistic method.

The derandomization method is best explained by an example: consider again the binary correlation
model where any possible sign change for some information sequence C is allowed. Denote this sign vector
by $A := [A_0, \ldots, A_{N-1}]$ and the resulting transmit sequence by $A \circ C$ (respectively $s_{A \circ C}$). Suppose that
all the sign changes happen at random with equal probability and each sign change is independent. As for
the LDP, define the random variables $y_i(A_0, \ldots, A_{i-1}) := E\{\mathrm{PAPR}(s_{A \circ C}) \mid A_0, \ldots, A_{i-1}\}$. Then we can
mimic the steps (5)-(8) and successively reduce randomness by applying:

$$A_i^* := \arg\min_{a_i \in \{-1, +1\}} y_{i+1}(A_0^*, \ldots, A_{i-1}^*, a_i)$$

By the properties of (conditional) expectations

$$y_{i+1}(A_0^*, \ldots, A_i^*) \le y_i(A_0^*, \ldots, A_{i-1}^*)$$

and finally $\mathrm{PAPR}(s) \le y_0$ since $y_N(A_0^*, \ldots, A_{N-1}^*)$ is simply a non-random quantity. Finally, by the
LDP $y_0 \le \log(N)$ for N large enough. Since the expectations are somewhat difficult to handle, instead of the
PAPR(s) typically the set function and corresponding bounds have been used. For example, Chernoff bounds
have been used in [38], [29], [41] showing good performance and low complexity. Moment bounds with better
tail properties have been used but the complexity is higher [42]. Performance results of the derandomization
method are reported in Fig. 5 comparing sequence balancing (Sec. IV-B) with and without derandomization.
The benefit of the derandomization method is clearly observed and corresponds to more than 4 dB gain in
HPA backoff (at $10^{-3}$ outage probability) which mimics exactly the results of the LDP analysis (Sec. IV-A).
However, it comes at the cost of 1 bit/dimension rate loss. The relevant trade-off between rate and PAPR
has rarely been investigated so far.

Fig. 5. The figure compares the sequence balancing method of Sec. IV-B in terms of the CCDF of PAPR for a 128 subcarrier OFDM
system using a.) derandomization algorithm where all possible subcarrier sign changes are allowed and b.) codes of given strength with
randomly chosen balancing vectors. For the code a strength 10 dual BCH code with only 18 redundant subcarriers is applied. Please
note that the simulation matches very well the theory predicting the cut-off point log (128) ≈ 6.8 dB.
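The greedy sign selection can be sketched in code once the conditional expectation is replaced by a tractable upper bound. The following is a minimal illustration under assumptions of our own, not the algorithm of any cited reference: the exact conditional expectation of the PAPR is swapped for a hyperbolic-cosine (Chernoff-type) potential on the real and imaginary parts of the oversampled signal, the parameter `lam` is a heuristic choice, and all names are hypothetical. For uniformly random signs the average of the two possible potentials equals the previous potential, so picking the smaller one never increases it.

```python
import cmath
import math
import random


def derandomized_signs(C, I=4, lam=None):
    """Fix the signs A_n one by one so that a cosh potential never increases."""
    N = len(C)
    M = I * N
    J = 2 * M  # real and imaginary part of every time sample
    if lam is None:
        lam = math.sqrt(2.0 * math.log(2 * J) / N)  # heuristic (assumption)
    # x[n][j]: contribution of subcarrier n to coordinate j for sign +1
    x = []
    for n in range(N):
        row = []
        for k in range(M):
            z = C[n] * cmath.exp(2j * math.pi * k * n / M)
            row.extend((z.real, z.imag))
        x.append(row)
    # tail[j]: product of cosh(lam * x[n][j]) over all undecided subcarriers
    tail = [1.0] * J
    for n in range(N):
        for j in range(J):
            tail[j] *= math.cosh(lam * x[n][j])
    partial = [0.0] * J        # signal built from the signs fixed so far
    potentials = [sum(tail)]   # bound before any sign is fixed
    signs = []
    for n in range(N):
        for j in range(J):
            tail[j] /= math.cosh(lam * x[n][j])  # subcarrier n is decided now
        best = None
        for a in (1, -1):
            phi = sum(math.cosh(lam * (partial[j] + a * x[n][j])) * tail[j]
                      for j in range(J))
            if best is None or phi < best[0]:
                best = (phi, a)
        phi, a = best
        signs.append(a)
        for j in range(J):
            partial[j] += a * x[n][j]
        potentials.append(phi)
    return signs, potentials


if __name__ == "__main__":
    rng = random.Random(1)
    frame = [cmath.exp(1j * math.pi / 2 * rng.randrange(4)) for _ in range(32)]
    signs, potentials = derandomized_signs(frame)
    print(signs)
    print(potentials[0], potentials[-1])
```

The returned list of potentials makes the monotone decrease visible; the final value bounds the hyperbolic cosine of every scaled coordinate of the transmit signal and hence its peak amplitude.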

Combining the derandomization method with partitioning yields several improved algorithms for standard
problems. For example, the PTS method has been treated in [43]. It is proved that with the derandomization
method PTS can achieve r log (N) where r is the percentage of partial transmit sequences related to N. The
tone reservation method has been treated using derandomization in [29] (see also Sec. VII-B4). Implicitly,



derandomization has been used in [36] to show that PAPR of some translate of a code C is below \C\ log (N). 
Related derandomization algorithms have been used in [39] adopting the so-called pusher-chooser game
from [40]. The idea is to choose $\ell_p$-norms and prove a recursive formula similar to the Chernoff method. The
approach can be generalized to alternative metrics if appropriate bounds are available: in [23] the SER has been
reduced using derandomization showing that 1 ° s ^ jv - > clipping level is sufficient asymptotically for zero error 
probability (instead of log (N)). Furthermore, in the recent paper [5] zero clipped energy is asymptotically 
achieved with clipping level log log (N). 

There is still plenty of room for improvements, e.g. by considering correlations between different sampling
points and incorporating other metrics as well Q, ll23l . It has not been noticed yet that this field is particularly 
underdeveloped and bears great potential for significant improvements of current systems. Another point to
be improved is the rate loss imposed by the current methods. 

V. Additional resources: MIMO and multiuser systems 

While highly beneficial in terms of spectral efficiency, MIMO systems complicate the PAPR problem: in
single-antenna systems the PAPR (or other metrics) of only one transmit antenna has to be controlled. In
the MIMO setting a large number of OFDM signals are transmitted in parallel and typically the worst-case
candidate dictates the PAPR metric (e.g., due to out-of-band power) [19].

As a consequence, PAPR reduction methods tailored to this situation should be utilized, instead of performing
single-antenna PAPR reduction in parallel. Multi-antenna transmitters provide additional degrees of
freedom which can be utilized beneficially for PAPR reduction; the full potential has not yet been explored
in the literature. Basically, the peak power can be redistributed over the antennas. By this, MIMO PAPR reduction
may lead to an increased slope of the CCDF curves (cf. Fig p]l, i.e., the probability of occurrence of large 
signal peaks can be significantly lowered compared to single-antenna schemes. This effect is similar to that of
achieving some diversity gain. 



When studying MIMO PAPR reduction schemes, two basic scenarios have to be distinguished: on the
one hand, in point-to-point MIMO transmission, joint processing of the signals at both ends (transmitter and
receiver) is possible. On the other hand, in point-to-multipoint situations, i.e., multi-user downlink transmission,
joint signal processing is only possible at the transmitter side (see footnote 4). This fact heavily restricts the
applicability of PAPR reduction schemes.

For the point-to-point setting, a number of PAPR reduction schemes have been designed, particularly
extensions of SLM. Besides ordinary SLM (conventional SLM is simply applied in parallel), simplified SLM
(the selection is coupled over the antennas) has been proposed in [44]. Directed SLM [45] is tailored to the
MIMO situation and successively invests complexity (test of candidates) only where PAPR reduction is really
needed.

It might be sufficient that the PAPR stays below a tolerable limit, determined by the actual radio frontend.
Here, complexity can be saved if candidate generation and assessment are done successively and stopped as soon as
the tolerable value is reached. Interestingly, the average number of assessed candidates is simply given by
the inverse of the cdf of the PAPR of the underlying original OFDM scheme, evaluated at the tolerable limit.
Notably, for a PAPR limit of log(N) and a reasonably large number N of carriers, the average complexity per
antenna is on the order of e = 2.718... (Euler's number) [46]. Alternative metrics have been used in [47].
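
The statement about the average number of candidates can be checked numerically. The following is a minimal sketch, not the analysis of [46]: it assumes independent candidates and approximates the cdf of the PAPR by (1 - exp(-gamma))^N, i.e., N independent time-domain samples with exponentially distributed power. This approximation and the function names `papr_cdf` and `average_candidates` are assumptions made for illustration.

```python
import math


def papr_cdf(gamma, n_carriers):
    """Approximate Pr{PAPR <= gamma}.

    Assumption (not from the text): n_carriers independent samples whose
    instantaneous power is exponentially distributed with unit mean.
    """
    return (1.0 - math.exp(-gamma)) ** n_carriers


def average_candidates(gamma, n_carriers):
    """Mean number of candidates assessed until one meets the limit gamma.

    With independent candidates the count is geometric, so its mean is
    the inverse of the per-candidate success probability (the cdf).
    """
    return 1.0 / papr_cdf(gamma, n_carriers)


if __name__ == "__main__":
    # For the limit gamma = log(N) the average approaches Euler's number.
    for n in (64, 256, 1024, 4096):
        print(n, round(average_candidates(math.log(n), n), 4))
```

Under this approximation the cdf at log(N) equals (1 - 1/N)^N, which tends to 1/e, so the average number of assessed candidates tends to e from above as N grows.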

Compared to point-to-point MIMO systems, PAPR reduction in point-to-multipoint
scenarios (multi-user downlink) is a much more challenging task. Since no joint receiver-side signal
processing is possible, candidate generation at the transmitter side may only use operations which can
be reversed individually at each of the receivers. Among the SLM family, only simplified SLM can be
used here. However, in this situation the usually present transmitter-side multi-user pre-equalization can be
utilized for PAPR reduction. Applying Tomlinson-Harashima precoding, the sorting in each carrier can be
optimized to lower PAPR at almost no cost in (uncoded) error rate fHfl . The same is true when applying 
lattice-reduction-aided pre-equalization. Here the unimodular matrices (describing a change of basis) can be 
optimized to control the properties of the transmit signals [48]. There are also links from MIMO PAPR
reduction and derandomization to code design (cf. Sec. VII-A).



VI. Going beyond: OFDM Capacity Fundamentals 

While the capacity of the discrete-time peak-power-constrained channel is known and computable, the
capacity of the OFDM peak-power-constrained channel is still an open problem [49], [50]. The problem is
indeed intricate, as it was unknown until very recently that there are exponentially many OFDM signals
with constant PAPR (cf. Sec. VII-A). However, no practical encoding scheme is known which comes even
close to this merely theoretical result. From this perspective the capacity problem awaits a more thorough
theoretical solution.

Recent work ETt ll22l on practical schemes indicates that clipped OFDM performs (almost) the same as 
unclipped OFDM with the signal-to-noise ratio reduced according to the clipping power loss. The main source
of the loss is not the introduced distortions or errors but simply the reduced output power. Given the OFDM
frame in frequency domain $C = [C_1, \ldots, C_N]$, the time-domain samples $s[k]$ are calculated via the IDFT. These
samples then undergo clipping in the amplifier frontend. As usual, the clipping behavior can be described
by a nonlinear, memoryless, point-symmetric function $g(x)$ (with $g(x) \le x$ for $x > 0$, applied element-wise to
vectors). In frequency domain, the clipped signal is given by $Z = \mathrm{DFT}\{g(\mathrm{IDFT}\{C\})\}$. Note that clipping
is a deterministic function and a one-to-one relation between the vector $C$ of unclipped symbols and the
vector $Z = [Z_1, \ldots, Z_N]$ of clipped ones exists. Assuming an AWGN channel, the receiver observes the vector
$Z$ disturbed by additive white Gaussian noise. In case of intersymbol-interference channels, the
symbols $Z_n$ are additionally individually scaled by the fading gain at the respective carrier.
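
The mapping Z = DFT{g(IDFT{C})} can be sketched in a few lines. The example below is illustrative only: it assumes a soft limiter (magnitude limited to `a_max`, phase preserved) as the function g, a unitary DFT pair, and Nyquist-rate sampling without oversampling. None of these choices is prescribed by the text, and the helper names are hypothetical.

```python
import cmath
import math


def idft(freq):
    """Unitary inverse DFT: frequency-domain frame -> time-domain samples."""
    n = len(freq)
    return [sum(freq[m] * cmath.exp(2j * cmath.pi * m * k / n) for m in range(n))
            / math.sqrt(n) for k in range(n)]


def dft(time):
    """Unitary DFT: time-domain samples -> frequency-domain frame."""
    n = len(time)
    return [sum(time[k] * cmath.exp(-2j * cmath.pi * m * k / n) for k in range(n))
            / math.sqrt(n) for m in range(n)]


def soft_limiter(x, a_max):
    """Assumed g(x): memoryless, point-symmetric, limits |x| to a_max."""
    mag = abs(x)
    return x if mag <= a_max else x * (a_max / mag)


def clip_frame(c, a_max):
    """Z = DFT{ g( IDFT{C} ) }, with g applied element-wise."""
    return dft([soft_limiter(s, a_max) for s in idft(c)])


def mean_power(v):
    return sum(abs(x) ** 2 for x in v) / len(v)


if __name__ == "__main__":
    c = [1.0] * 8                      # worst-case BPSK frame: one large peak
    z = clip_frame(c, a_max=1.0)
    print(mean_power(c), mean_power(z))  # power before and after clipping
```

For the all-ones frame of length 8 the single time-domain peak of magnitude sqrt(8) is limited to 1, so the mean power per carrier drops from 1 to 1/8. This is the attenuation of the useful signal that the text identifies as the main source of loss.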

4 In the multipoint-to-point scenario (multiple-access channel) no joint optimization of the transmit signals can be performed, hence 
this case is not amenable for MIMO approaches. 




Fig. 6. Top row: visualization of the effect of clipping on the set of possible OFDM frames (here: N = 3, 2-PAM per carrier). Bottom 
row: visualization of predistortion via an algorithm for maximizing the power of the signal after clipping. 

This clipping behavior can be visualized for N = 3 and 2-PAM per carrier, see the top of Fig. 6. The initial
hypercube with vertices given by all possible vectors C is distorted. However, the attenuation of the useful
signal (the vector Z has lower energy) will be the dominating effect over deformation. This, in turn, leads
to the conclusion that a suitable metric for capacity maximization is simply the average power of the power
amplifier output signal.

A possible strategy is shown at the bottom of Fig. 6. A signal shaping algorithm may adjust the signal
points in 2N-dimensional real-valued space such that after clipping the set of all possible OFDM vectors in
frequency domain forms (approximately) a hypercube with energy close to that of the initial constellation.
First work on using the strategy of active constellation extension for achieving this goal has been presented

EE). 

VII. Emerging solutions: An open field 

A. New Trends in Code Design 

Jones et al. [52] were the first to describe block coding schemes in the present context. This framework has
been put in systematic form by observing the connection between cosets of Reed-Muller codes and complementary
sequences [53], [54]. Unfortunately, these approaches have limited potential for modern OFDM systems due
to their limited coding rate. The fundamental trade-off between key code properties such as rate,
PAPR, etc. was explored and discussed in [55]. More recent ideas use sequence balancing and
code extensions in the form of erasure coding in other domains (e.g., MIMO [56]) to tackle the PAPR problem
with an inner code, while error correction is still done via an outer code [34].

1) Codes and sequences with low PAPR: Though most multicarrier signals of length N have PAPR
close to log(N), it turns out that signals with constant PAPR are not so rare. Using a remarkable result of
Spencer [57] it is possible to show that the number of such BPSK modulated signals is exponential in N.
Namely, there are at least $(2 - \delta_K)^N$ such signals with PAPR not exceeding K, where $\delta_K$ is a constant
depending on K and tending to zero when K grows. It is an open question how to generate many such signals
for given K.
A lot of research was devoted to describing signals with low values of PAPR. For BPSK modulated signals
an extreme example is provided by Rudin-Shapiro sequences, defined recursively from $P_0 = Q_0 = 1$ and

$$ P_{m+1}(z) = P_m(z) + z^{2^m} Q_m(z), \qquad Q_{m+1}(z) = P_m(z) - z^{2^m} Q_m(z). $$

These sequences, whose length is a power of 2, have PAPR at most 2. More general examples of sequences
with PAPR at most 2 arise from Golay complementary sequences. Two sequences constitute a complementary
pair if their aperiodic autocorrelation functions sum to zero at every nonzero shift. Many methods are known
for constructing such sequences, see [2, Section 7.6]. Notice that it is not known whether BPSK modulated signals
can have PAPR less than 2. However, if one increases the size of multiphase constellations to infinity, there
exist sequences with PAPR approaching 1 [2, Theorem 7.37]. For constructions of multiphase complementary
pairs from cosets of Reed-Muller codes see [58] and references therein. The PAPR of m-sequences and Legendre
sequences is discussed in [2, Sections 7.7 and 7.8].
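
The bound for Rudin-Shapiro sequences is easy to verify numerically. The sketch below uses the standard recursion, in which appending Q to P (respectively the negated Q) corresponds to multiplying Q by z raised to its own length. It estimates the PAPR on an oversampled grid, so the result is a close lower estimate of the continuous-time value. The function names and the oversampling factor are assumptions made for illustration.

```python
import cmath


def rudin_shapiro_pair(m):
    """Coefficient lists of P_m and Q_m (length 2**m, entries +1/-1).

    P_{m+1} = P_m + z^(2^m) Q_m  -> concatenate P_m with Q_m
    Q_{m+1} = P_m - z^(2^m) Q_m  -> concatenate P_m with -Q_m
    """
    p, q = [1], [1]
    for _ in range(m):
        p, q = p + q, p + [-x for x in q]
    return p, q


def papr(coeffs, oversampling=8):
    """Peak of |s(t)|^2 on an oversampled grid divided by the mean power."""
    n = len(coeffs)
    grid = n * oversampling
    peak = max(
        abs(sum(c * cmath.exp(2j * cmath.pi * k * t / grid)
                for k, c in enumerate(coeffs))) ** 2
        for t in range(grid)
    )
    return peak / sum(abs(c) ** 2 for c in coeffs)


if __name__ == "__main__":
    p, q = rudin_shapiro_pair(5)          # N = 32 carriers
    print(papr(p), papr(q))               # both stay at or below 2
    print(papr([1] * 32))                 # all-ones frame for comparison
```

Since |P_m|^2 + |Q_m|^2 = 2N on the unit circle with N = 2^m, neither polynomial can exceed 2N in squared magnitude, which is exactly the PAPR bound of 2. The all-ones frame of the same length reaches the worst case N.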

Often we need to know the biggest PAPR among sequences belonging to a code. Bounds on the PAPR of
codes on the sphere as a function of their sizes and minimum Euclidean distances were studied in [53]. A relation
between the distance distribution of codes and PAPR was derived in [29]. This yielded bounds on the PAPR of
long algebraic codes, such as BCH codes. Analysis of the PAPR of codes with iterative decoding, for instance
LDPC codes, remains an open problem. The PAPR of codes of small size was studied in [53]. In particular, it
was shown that the PAPR of duals of length-N BCH codes is at most $\log^2(N)$. Bounds on the PAPR
of Kerdock and Delsarte-Goethals codes were derived as well. In [34] it was shown that in every coset of a code
dual to a BCH code with minimum distance log(N) there exists a codeword with PAPR at most log(N).
At the same time, this leads to a very modest rate loss. Still, constructing codes having low PAPR and high
minimum distance seems to be a challenge.

Computing the PAPR of a given code is a computationally demanding problem. If a code has a reasonably
simple maximum-likelihood decoding algorithm, it is possible to determine its PAPR efficiently [59], [60].

In [56], off-the-shelf channel codes, in particular Reed-Solomon (RS) and Simplex codes, are employed to
create candidates, from which, as in SLM, the best are selected. The codes are thereby arranged over a number
of OFDM frames rather than over the carriers. Such an approach is very flexible since, due to the selection step,
any criterion of optimality can be taken into account. Moreover, instead of applying the approach to the
MIMO setting, it can also be used if blocks of temporally consecutive OFDM frames are treated jointly. The
method is illustrated in Fig. 7.

2) Constellation shaping: In constellation shaping, we have to find a constellation in the N-dimensional
frequency domain such that the resulting shaping region in the time domain has low PAPR. At the same
time we would like to have a simple encoding method for the chosen constellation. Such shaping based on
the Hadamard transform was considered in [61]. The main challenge in constellation shaping is to find a unique
way of mapping (encoding) and its inverse (decoding) of reasonable complexity. The approach suggested
in [62], [61] is based on a matrix decomposition. Though the simulation results are quite promising, the
implementation complexity still seems to be far from affordable [63], [62].



B. Banach space geometry 

An interesting new approach to the PAPR problem is that of using Banach space geometry. Banach space 
geometry relates norms and metrics of different Banach spaces to each other. For example, a question that 
often arises is: assume a Banach space with unit norm ball $B_1$ and another Banach space with ball $B_2$; both
spaces are of finite, possibly different dimension. What is the relation between the norms if the projection 
of one ball covers the other ball? Furthermore, what is the dependence of this relation on the dimensions? 

Interestingly, these relations turn out to be useful for the PAPR problem in several other ways depending 
on the underlying Banach spaces as the following examples show: 



1) Alternative orthonormal systems: Kashin & Tzafriri's theorem: In Sec. IV it was shown that OFDM 
has unfavorable PAPR of order log (N) if N gets large. One might be inclined to ask if this is an artifact 
of the underlying orthonormal signaling system. The answer is actually no with the implication that OFDM 
plays no specific role among all orthonormal systems. Already in [64] it was shown that worst PAPR is of 
order N regardless of the signaling system (multicarrier CDM etc.). But even if we consider not the worst
PAPR but look at the PAPR on average, the situation does not get better. In [65], Kashin & Tzafriri proved
that for any orthonormal system on a given finite time interval the expectation of PAPR is necessarily of 
order log (N). Again, changing the signaling is not beneficial in terms of PAPR. The underlying mathematical 
problem is that of estimating the supremum norm of a finite linear combination of functions weighted with 
random coefficients both constrained in the energy norm. 
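
For the OFDM special case, the growth of the average PAPR with N can be observed with a small Monte Carlo experiment. This sketch only illustrates the trend for one particular orthonormal system. It does not reproduce the general result of [65], and the use of random BPSK coefficients, the oversampling factor, the trial count and the function names are all assumptions.

```python
import cmath
import math
import random


def papr(coeffs, oversampling=4):
    """Peak-to-average power ratio of an OFDM frame on an oversampled grid."""
    n = len(coeffs)
    grid = n * oversampling
    # Precompute the complex exponentials once; index arithmetic is mod grid.
    tw = [cmath.exp(2j * cmath.pi * i / grid) for i in range(grid)]
    peak = 0.0
    for t in range(grid):
        s = sum(c * tw[(k * t) % grid] for k, c in enumerate(coeffs))
        peak = max(peak, abs(s) ** 2)
    return peak / sum(abs(c) ** 2 for c in coeffs)


def mean_papr(n, trials=20, seed=0):
    """Average PAPR over random BPSK frames (seeded for reproducibility)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        frame = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        total += papr(frame)
    return total / trials


if __name__ == "__main__":
    # Compare the empirical average with log(N) for a few frame lengths.
    for n in (16, 32, 64):
        print(n, round(mean_papr(n), 2), round(math.log(n), 2))
```

The printed averages should be read as an order-of-magnitude comparison with log(N) only. Constants and lower-order terms are not captured by such a short experiment.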

2) Is PAPR of single-carrier really much better?: It is common engineering experience that single-carrier 
has better PAPR than multicarrier. But it might be worth raising this question again within the context of 
upcoming technological advances (LTE-A etc.) which operate much closer to the Nyquist bandwidth and, 
moreover, use different modulation and coding schemes. Let us formalize this question. 

Suppose we send a transmit sequence $C_1, \ldots, C_N$ and use a band-limited filter to generate the
continuous-time signal (the bandwidth is set to $\pi$ for simplicity). The transmit signal can be described by

$$ s(t) = \sum_{i=1}^{N} C_i \, \frac{\sin \pi (t - t_i)}{\pi (t - t_i)} $$

with sampling points $t_i \in \mathbb{Z}$. Naturally, band-limited signals of this form have very different PAPR behavior
compared to OFDM since, obviously, if the coefficients are from some standard modulation alphabet, the
signal is nailed down to some finite value at the sampling points independent of N. However, within the
sampling intervals (on average) large PAPR could actually occur. Notably, the worst case grows
without bound linearly in N.

Surprisingly, the exact answer to this problem had not been explored until very recently [66]; it is
basically a result on large deviations in Banach spaces. It is proved in [66] that such bad PAPR cannot
actually happen and that there is a constant $c_0 > 0$ such that

$$ \mathbb{E}(\mathrm{PAPR}) \le c_0 \log\log(N). $$

But we also see the catch here. Modern communication systems use higher-order modulation, and in that
case the influence of the data becomes dominant if the distribution becomes Gaussian-like. In that case we
approach log(N) again.

There is an interesting connection of the PAPR problem to the Hilbert transform: since in many
standard communication models, e.g., in Gabor's famous Theory of Communication [67], [68], the transmit
signal is a linear combination of a signal and its Hilbert transform, properties such as PAPR in the transform
domain become more and more important. Although initiated by early works of Logan [69], who investigated the Hilbert
transforms of certain bandpass signals, it was recognized only very recently [70], [71] that the results
are fragile for wideband signals containing spectral components in an interval around zero frequency. Then,
in general, the domain of the Hilbert transform must be suitably extended; further, examples of bandlimited
wideband signals are provided where the PAPR grows without bound in the Hilbert transform domain [72].
Hence, for certain single-carrier analytic modulation schemes the transmit signal has to be shaped very
carefully.

3) Overcomplete expansions with uniformly bounded PAPR: While the result for arbitrary orthonormal
systems appears rather pessimistic, there is a possible solution in the form of frames. Frames are overcomplete
systems of N vectors in $\mathbb{R}^n$, $n < N$. Let us collect these vectors in the matrix $U = [u_1, \ldots, u_N] \in \mathbb{R}^{n \times N}$, $N > n$. Then,
if the rows of $U$ are linearly independent, there is $x \in \mathbb{R}^N$ so that

$$ y = U x \quad (10) $$

for any $y \in \mathbb{R}^n$, and the columns of $U$ form a frame. If $U U^T = I_n$, where $I_n$ is the identity matrix, then
it is called a tight frame. In seminal work, Kashin [8] interpreted the mapping (10) as an embedding of the
Banach space with supremum norm into the Banach space with standard Euclidean norm $\ell_2$ and asked for
the growth factor $K(\lambda) > 0$, $\lambda := N/n$, between the two norms when the $\ell_2$ unit ball in $\mathbb{R}^n$ should be
covered by the unit ball in $\mathbb{R}^N$. Such representations are called Kashin representations of level $\lambda$ [73].

Clearly, if $N = n$ then $K(\lambda) = \sqrt{N}$. However, if $N > n$ (overcomplete expansion), Kashin proved
that there is a subspace of $\mathbb{R}^N$ generated by a frame $U$ such that $K(\lambda)$ is given by

$$ K(\lambda) := c_1 \left( \frac{\lambda}{\lambda - 1} \log\left( 1 + \frac{\lambda}{\lambda - 1} \right) \right)^{1/2}, \quad c_1 > 0. $$

Hence, $K(\lambda)$ is uniformly bounded in n if $\lambda > 1$ is fixed. Good estimates of the constant $c_1 > 0$ are
not known [73].

This intriguing result has been applied in the PAPR context in [73], and the implications for peak power
reduction are immediate. The matrix U can be taken as a precoding matrix for classical OFDM transmission
to achieve uniformly bounded PAPR. Unfortunately, the construction of the optimal subspace is not known
[73]. Kashin representations exploiting the uncertainty principle of random partial Fourier matrices are
presented in [74].

4) Tone reservation and Szemerédi's theorem: One of the oldest but still very popular schemes is tone
reservation [75]. Despite its simplicity, many of the questions involved are still open, which is no
coincidence: recent work in [76] has analyzed the performance of this method using
Szemerédi's famous theorem on arithmetic progressions (Abel Prize 2012).

Recalling the setting where a subset of subcarriers is solely reserved for peak power reduction, the challenge
is to find, for a given set of transmit sequences, a subset and corresponding values such that the PAPR is
reduced as much as possible. Until now, achievability and limits are not known (except for simple
cases). Therefore, there is some incentive to look at this problem from a new perspective. Ref. [76] has
analyzed the case where the compensation set is arbitrary but fixed. In this typical case it is proved that the
efficiency of the system, i.e., the ratio of the cardinalities of the information and compensation sets, must decrease to
zero if the peak power is constrained independently of the number of subcarriers. The technique is to derive
necessary conditions on the relations between unit spheres in the Banach spaces. These relations are shown not to
hold asymptotically for sets with additive structure. However, Szemerédi's theorem states that such sets are
contained in every subset of cardinality $\delta N$ where $\delta > 0$. In fact, such arithmetic progressions induce signals
with bad PAPR behavior, which the method would naturally have to exclude. The theorem shows that this is not possible.

In extended work [77], other families of orthogonal signalling such as Walsh sequences are also analyzed,
all of them showing basically the same discouraging result regarding the system's efficiency. This leads to
the conjecture in [77] that all natural orthogonal signalling families have this behavior.



C. Compressed sensing 

Compressed sensing J6), Q is a new sampling method that compresses a signal simultaneously with data 
acquisition. Each element of the compressed signal or measurements consists of a linear combination of the 
elements in the original signal and this linear transformation is independent of instantaneous characteristics 
of each signal. In general, it is not possible to recover an unknown original signal from the measurements in 
the reduced dimension. Nevertheless, if the original signal has sparsity property, its recovery can be perfectly 
achieved at the receiver. Since sparsity frequently appears in the PAPR problems of the OFDM systems, 
compressed sensing can be a powerful tool to solve these problems. 

Compressed sensing can be regarded as minimizing the number of measurements while still retaining the
information necessary to recover the original signal well (i.e., beyond classical Nyquist sampling). The process
can be briefly illustrated as follows. Let $f$ denote a signal vector of dimension N and $g$ be a measurement
vector of dimension M with M < N obtained by $g = \Phi f$, where $\Phi$ is called the sensing matrix. At the
transmitter, sampling and compression are performed together by simply multiplying $f$ by $\Phi$ to obtain $g$.
At the receiver, if $f$ is an S-sparse signal, which means $f$ has no more than S nonzero elements, it is shown
in Q that the exact $f$ can be obtained from $g$ by using $\ell_1$ minimization, that is,

$$ \min_{f} \|f\|_1 \quad \text{subject to} \quad g = \Phi f \quad (11) $$

as long as $\Phi$ has a good property called the restricted isometry property (RIP). For a positive
integer S, the isometry constant $\delta_S$ of a matrix $\Phi$ is defined as the smallest number such that

$$ (1 - \delta_S) \|f\|_2^2 \le \|\Phi f\|_2^2 \le (1 + \delta_S) \|f\|_2^2 $$

holds for all S-sparse vectors $f$. Under RIP with $\delta_{2S} < \sqrt{2} - 1$, (11) gives the exact solution for $f$
[78]. This recovery method using $\ell_1$-minimization is called basis pursuit (BP) [79], which requires high
computational complexity. Many greedy algorithms [80], [81], [82], [83] have been developed to reduce the
recovery complexity.

In many applications of compressed sensing such as communication systems, it is required to recover $f$
from the corrupted measurements $g' = g + z$, where $z$ is a noise vector of dimension M. For this, recovery
algorithms such as basis pursuit denoising [79], Lasso [84], and their variants have been developed, while the
existing recovery algorithms can also be used. However, these algorithms still do not perform well
enough to be adopted in wireless communication systems, which usually require very low error rates even in
severely noisy environments.

Related to PAPR problems, the concepts of compressed sensing such as sparsity, RIP, and
recovery algorithms can be utilized in many PAPR reduction schemes. In [85] and [86], a new tone-reservation
scheme is proposed, which differs from the existing tone reservation [4] in that it provides a guaranteed
upper bound for PAPR reduction as well as guaranteed rates of convergence. This scheme exploits the RIP of
the partial DFT matrix. In [87], a novel convex optimization approach is proposed to numerically determine
the near-optimal tone-injection solution. Generally, tone injection [4] is an effective approach to mitigate
the PAPR problem without incurring bandwidth loss. However, due to its computational complexity, finding the
optimal tone injection becomes intractable for OFDM systems with a large number of subcarriers. Therefore,
a semi-definite relaxation needs to be adopted in the convex optimization [88]. Moreover, based on the
observation that only a small number of subcarrier symbols are usually moved, $\ell_0$ minimization is required,
and naturally it can be relaxed to $\ell_1$ minimization, similar to the compressed sensing literature.

One of the popular solutions to PAPR reduction is clipping the amplitude of the OFDM signal, although
clipping increases the noise level by inducing clipping noise. Due to the sparsity of the clipping
noise, compressed sensing can be used to recover and cancel it. Before clipping noise
cancellation schemes using compressed sensing appeared, some of their foundations had been presented. First, an
impulse noise cancellation system using sparse recovery is proposed in [89]. In practical systems, there
exists a set of null tones not used for information transmission, which is exploited as measurements to estimate
the impulse noise in time domain at the receiver. As an extension to [89], an alternative recovery algorithm
with low complexity is proposed in [90], which jointly exploits the structure of the DFT matrix and available a-priori
information for sparse signal recovery. In [91], the work in [89] is extended to the case of bursty
impulse noise, whose recovery is based on block-based compressed sensing. Second, a
clipping noise cancellation scheme using frame theory is proposed in [92]. Although this scheme uses
frame expansion rather than compressed sensing, the frame expansion can be viewed as a special case of the compressed
sensing problem with known positions of nonzero elements. Some additional reserved tones not carrying
data are padded, and they are used as the measurements to recover the clipping noise at the receiver.

Motivated by the above works, clipping noise cancellation schemes using compressed sensing have been
proposed in [93] and its extended version [94]. In [93], [94], M reserved tones are allocated before clipping
at the transmitter, and they cause some data rate loss. These reserved tones can be exploited as measurements
instead of the null tones in [89], [90], [91], [95]. Let us denote the transceiver model in frequency domain with
clipping noise as

$$ Y = H(C + D) + Z \quad (12) $$

where $C$ and $Y$ are the $N \times 1$ transmitted and received tone vectors, respectively, $H$ is a diagonal matrix of
the channel frequency response, $D$ is the $N \times 1$ clipping noise vector, and $Z$ is the AWGN vector. Starting from
(12), we equalize the channel by multiplying with $H^{-1}$ and select the rows whose indices correspond to the
locations of the reserved tones by multiplying with an $M \times N$ row selection matrix $S_r$. Since the reserved
tones carry no data ($S_r C = 0$), this results in

$$ S_r H^{-1} Y = S_r F d + S_r H^{-1} Z, \quad (13) $$

where $F$ is the DFT matrix and $D = F d$. As seen in (13), the clipping noise on the reserved tones is used as
measurements to recover the clipping noise $d$ in time domain by a sparse recovery algorithm. Additionally, in
[94], a method exploiting a-priori information together with weighted $\ell_1$ minimization for enhanced recovery
followed by Bayesian techniques is proposed. However, the performance of [93] and [94] is restricted due
to the weakness of compressed sensing against noise.
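
The measurement model on the reserved tones can be illustrated with a deliberately simple case. The sketch below is not the algorithm of [93] or [94]: it assumes a noise-free, already equalized channel and a clipping noise vector d with a single nonzero entry (S = 1), and it recovers that entry with one matching step of a greedy pursuit over the columns of the partial DFT matrix. The tone set, the signal values and all function names are hypothetical.

```python
import cmath
import math


def dft_entry(m, k, n):
    """Entry (m, k) of the unitary N-point DFT matrix F."""
    return cmath.exp(-2j * cmath.pi * m * k / n) / math.sqrt(n)


def measure(d, reserved):
    """y = S_r F d: frequency-domain clipping noise on the reserved tones."""
    n = len(d)
    return [sum(dft_entry(m, k, n) * d[k] for k in range(n)) for m in reserved]


def recover_single_impulse(y, reserved, n):
    """One greedy matching step: pick the column of S_r F best aligned with y.

    Valid as exact recovery only under the stated assumptions
    (one nonzero entry in d, no noise).
    """
    best_k, best_corr = 0, 0j
    for k in range(n):
        corr = sum(dft_entry(m, k, n).conjugate() * ym
                   for m, ym in zip(reserved, y))
        if abs(corr) > abs(best_corr):
            best_k, best_corr = k, corr
    # Each column of S_r F has squared norm M / N, hence the rescaling.
    amplitude = best_corr * n / len(reserved)
    return best_k, amplitude


if __name__ == "__main__":
    n, reserved = 16, [1, 2, 5, 11]       # hypothetical reserved tone set
    d = [0j] * n
    d[5] = 0.7 - 0.2j                     # single clipping-noise impulse
    print(recover_single_impulse(measure(d, reserved), reserved, n))
```

For more than one impulse or in the presence of noise, a full sparse recovery method such as basis pursuit or an iterative greedy algorithm would be needed. The single step above only shows how the reserved tones act as measurements of the time-domain clipping noise.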

In [96], a more enhanced clipping noise cancellation scheme using compressed sensing is proposed. Different
from [93] and [94], the scheme in [96] does not cause data rate loss, because it uses as measurements the
clipping noise in frequency domain contained in the data tones rather than in reserved tones. In this
case, transmitted data and clipping noise are mixed in the data tones. To distinguish the clipping noise from
the data well, this scheme exploits the part of the received data tones with high reliability. We multiply (12)
by $H^{-1}$ and the row selection matrix $S_d$, which selects the locations of the reliable data tones, as

$$ S_d H^{-1} Y = S_d C + S_d D + S_d H^{-1} Z. \quad (14) $$

Then, we estimate $S_d C$ by $S_d \hat{C}$ and subtract it from (14) as

$$ S_d H^{-1} Y - S_d \hat{C} = S_d F d + S_d (C - \hat{C}) + S_d H^{-1} Z. \quad (15) $$

Then, from the partially extracted clipping noise in frequency domain, we can recover the clipping noise $d$ in time
domain via sparse recovery algorithms. Furthermore, this scheme can adjust the number of measurements
M by changing the reliability threshold for the received data. Therefore, in the presence of AWGN, we can select the
optimal number of measurements corresponding to the noise level. Consequently, this scheme successfully
realizes clipping noise cancellation by overcoming the weakness of compressed sensing against
noise. Additionally, in [96], clipping noise cancellation for orthogonal frequency-division multiple access
(OFDMA) systems is also proposed using compressed sensing. The fast Fourier transform (FFT) block of
OFDM systems can be decomposed into small FFT blocks, and a subset of rows of the small-sized
DFT matrix can also be used as a sensing matrix, which can be used to recover the clipping noise for OFDMA
systems via a sparse recovery algorithm.

Fig. 8. BER performance in OFDM AWGN channel using compressed sensing based clipping noise cancellation schemes as described
in Al-Safadi et al. [93] and Kim et al. [96]. In [96], 23 out of 64 QPSK subcarriers are used for recovery of the S = 4 sparse clipping
noise. In [93], 29 tones are selected by a (59, 29, 14) difference set and reserved. Baseline performance is the original OFDM signal with
no clipping. It can be seen that, provided the AWGN is not too strong, sparsity in the clipping noise can lead to significant performance
gains of more than 3 dB at an error probability of $10^{-6}$.

Fig. 8 shows the bit error rate (BER) over signal-to-noise ratio (SNR) performance of the clipping noise
cancellation schemes based on compressed sensing described in [93] and [96] for OFDM signals over the
AWGN channel. The S-sparse clipping noise signal contaminates the original OFDM signal, and the case of
no clipping noise cancellation shows the worst BER performance among all schemes. In [93], the authors
applied the compressed sensing technique to OFDM systems for the first time, but there is a benefit only in
the high SNR region due to the weakness of compressed sensing recovery against AWGN. The BER performance
of the scheme in [96] is better because the number of measurements can be adjusted according to
the AWGN level.

VIII. Conclusions 

Despite two decades of intensive research, the PAPR problem remains one of the major problems in
multicarrier theory with huge practical impact. This article provides a fresh look at this problem by outlining
a new perspective using alternative metrics (including MIMO and multiuser systems as a special case), the
corresponding theoretical foundations and related designs. This is followed by a thorough discussion of current
limits and future directions.